Article Info

Article history:
Received Jan 5, 2022

Keywords:
Driver distraction
Feature extraction
Support vector machine

ABSTRACT

Increasing efforts in the transportation system have recently improved driver safety and reduced crash
rates. Lack of attention and fatigue directly affect the driver's consciousness. Driver distraction is an
essential driver-specific factor [...] be mitigated. This paper proposes facial detection to identify features
and test anomaly prediction for drivers using stacked convolutional neural network (CNN) layers. The
proposed model uses overlapping HAAR and stacked CNN features to classify eye areas as open or closed. In
addition to the sliding query window's overall intensity information, the conventional HAAR function, which
elevates the brightness of nearby regions, is still preferable. This method considers current intelligent
transportation system-based solutions to minimize distraction effects by continuously comparing with flexible
thresholds. The experimental results are analyzed from accurate driving datasets. At 456 iterations, the
results reached over 80% accuracy, while the loss is near zero. The implication of the driver's risk tolerance
is further explored in this manner. Several risks are connected to driving any type of transportation system.

1. INTRODUCTION 

Reducing road traffic injuries and fatalities has become a major global health priority.
According to the World Health Organization (WHO) [1], car accidents are expected to become the eighth
major cause of death worldwide by 2030, especially among millennials. The Global Status Report on Road
Safety (2018) has shown that this cost increases every year. In Malaysia, approximately 7,152 deaths
occurred in 2016, with 87% of fatalities being males and 13% females [2]. There were 27,613,120 road
accidents involving various vehicles. A total of 25,800,679 recorded accidents included 12,677,041 motorized
two- and three-wheelers, 1,191,310 heavy trucks, 59,977 buses, and 561,154 other vehicles [2]. The number of
motorway traffic deaths was estimated using data from the Royal Malaysian Police and the Department of
Road Transport [3]. Fatal accidents involving registered cars numbered 6,915 in 2013, 6,674 in 2014, 6,706 in
2015, 7,152 in 2016, and 6,740 in 2017. These figures exceed the baseline. The fatality rate fell slightly
from 1.42% in 2016 to 1.22% in 2017. These data indicate that the Malaysian road safety strategy should be
more advanced to decrease these numbers. It is important for all road safety stakeholders to identify the main
cause of road traffic fatalities [3]. 

Given the severe impact of fatalities, the causes of road accidents must be investigated to alleviate
concerns. Vehicle manufacturers should enhance traffic condition tracking and monitoring systems through
intelligent surveillance, since cross-functional activities present a challenge for everyone. With the rise of
multitasking, distractions can be fatal for drivers, especially on the highway. Three variables are found to
contribute to distractions among drivers: 

a. Sleepiness or fatigue: sleep deprivation or exhaustion may occur throughout a journey. In this scenario, the 
driver would not be in the proper state of mind to continue driving [4]. 

b. Outdoor distractions: the driver may engage in other activities such as texting or listening to loud music, 
mainly from the audio system [5]. 

c. Weather: the forecasted weather may become unpredictable and rapidly change. Choosing to drive can be 
hazardous. Strong thunderstorms or other weather phenomena (such as heavy fog) may significantly 
increase hazards on the road [6]. 

By extracting local features from visual data, convolutional layers reduce the number of variables to 
be studied [7]. Researchers have subsequently highlighted the latest method to address road accidents. With 
such distracting elements and the possibility for machine learning-based methods to detect driver distraction, 
these technologies are potentially promising solutions. In Ersal et al. [8], a neural network (NN) model was
combined with an overlapping HAAR classification algorithm to identify moments of driver distraction.
Normal behaviour was defined as the instances when the driver is not performing any secondary task. Another
approach, described by Wollmer et al., used general vehicle dynamics and driver head tracking information to
model driver distraction based on long short-term memory (LSTM). The researchers created the framework to
detect distracted driving behaviour [9]. Driver distraction detection was combined with adaptive safety systems 
in Iranmanesh et al. [10] to reduce false warnings while maintaining necessary ones. It is worth noting that 
their framework did not include any features related to the driver's eyes or face [10]. Artificial intelligence
systems are among the most promising methods for improving safety, owing to the ability of computer vision to
handle emergency situations. Convolutional neural networks (CNNs) are a significant subset of artificial
intelligence, as discussed in previous work [11], [12]. Accuracy is expected to be higher when real-time visual
detection [13] of the driver's condition using a CNN serves as input data for the forward collision warning (FCW)
system. The FCW system can provide an early warning before a collision with an object in
front of the vehicle [14]. Nevertheless, the problem faced by FCW in the vehicle is the issuance of false alerts
or the inability to issue current warnings effectively. This situation can lead drivers to turn the
system off because of annoying alerts. Each driver has unique characteristics that drastically impact their decisions
and reactions in different driving conditions. A particular driver's driving style depends on the instantaneous
mental state, road condition, and traffic situation of the vehicle [15]-[17]. The research reveals several
challenges and limitations regarding the FCW system: a self-learning algorithm [18], eye-tracking
recognition and identification, and an adaptive driving assistance system. The key findings of this previous
research show limitations of the eye closeness detection system, such as blurring or high pixels in
images [19], [20]. 

A promising way to resolve this issue is to ensure that the driver assistance system adapts to the individual
driver. The system can then adjust the control strategy to different driver characteristics. When the driver
features in the system can be complemented automatically by adding the learning results to the visual technique,
the system can assist the driver's behaviours. In the early stages, research into this adaptive behaviour
assistance method focused mainly on classical algorithms, such as Sobel edge detection [21], support vector
machine [22], WCN classification [23], the CART method [24], standard AdaBoost [25], the Viola-Jones algorithm [26],
and the Haar cascade classifier [27]. More advanced research detects and identifies driver
distractions via the eye closeness visual feature of the driving system. 

This research proposes a new methodology for developing an artificial intelligence system that
determines drowsiness or fatigue among drivers using the stacked CNN framework for feature extraction,
combined with overlapping HAAR features. Its integration demonstrates the importance of comparing classification
techniques for driver distraction while incorporating computer vision under varying circumstances and conditions.
This paper also focuses on validating the algorithm by computing the system's accuracy and loss. This
research further incorporates existing literature to develop a strategy that addresses the factors contributing to
road accidents or fatalities from various perspectives, including the driver's eyes and face. Our distraction
detection module broadens the applicability of the proposed framework through reliance on in-vehicle cameras,
which has not been previously covered in the literature. This study also investigates potential research that can
be considered for planning road safety strategies and reducing false alarms that may trouble drivers. 


2. RESEARCH METHOD 
2.1. Cascaded object detection 

This phase describes the classification of eye closeness using a vision cascade object detection
algorithm. In the first step, eye regions are classified as open or closed using HAAR
features and the stacked CNN. HAAR cascade object detection is one of the well-known cascaded object
detection algorithms; it detects the presence of parts such as the eyes, nose, and mouth in a frontal face. It
computes the face features using an integral image representation. The integral image
can be calculated from an image using a few operations per pixel. Once it is available, the HAAR features
can be computed at different locations. The value of a two-rectangle feature is the difference between the pixel
sums of the two rectangular regions, evaluated in both vertical and horizontal directions. The sum over any
rectangle is computed from four array references on the integral image. The difference between two rectangular
sums therefore needs eight references, and because the two rectangles of such a feature are adjacent, it can be
calculated from six array references. 


2.2. Stacked convolutional neural network 

After computing the eye region, the stacked CNN is deployed to classify whether the eye is closed or
open. A deep CNN (DCNN) convolves the image with its kernel functions to yield the feature map
vectors. The estimated kernel weights connect the units of each predecessor and successor layer, and they are
determined at the convolutional layers during the training process. The feature vectors of each frame are
fed to the CNN, and the training is then carried out. The functionality of the CNN can be divided into four
key areas: i) the HAAR-based eye feature vectors are fed to the input layer, ii) the convolutional layer takes
the features of neurons related to the local regions and computes the scalar product between those regions
and the neuron's weights, iii) the pooling layer then performs the down-sampling
process on the activations, and iv) the fully connected layer generates scores for the classes (from the
activations) that are used for the classification process. Figure 1 illustrates the drowsiness detection using
stacked models for eye closeness. The methodology contributes to the literature in that it combines deep neural
network models with supervised machine learning to classify the drowsy condition. 
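
The four stages listed above (input, convolution, pooling, fully connected scoring) can be illustrated with a minimal forward pass. This is a sketch under stated assumptions, not the network used in this study: it has a single convolution and pooling stage (a stacked model would repeat them), the weights are random and untrained, and the 6x6 input, the 3x3 kernels, the ReLU and softmax steps, and all function names are hypothetical choices for the example.

```python
import math
import random


def conv2d_valid(image, kernel):
    """Scalar product of the kernel with each local region (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def relu(fmap):
    """Element-wise activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Down-sample by keeping the maximum of each non-overlapping 2x2 block."""
    return [[max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]


def dense(vector, weights, biases):
    """Fully connected layer: one score per class."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn class scores into probabilities that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(eye_patch, kernels, fc_weights, fc_biases):
    """Input -> convolution -> pooling -> fully connected -> class probabilities."""
    features = []
    for kernel in kernels:
        pooled = max_pool2x2(relu(conv2d_valid(eye_patch, kernel)))
        features.extend(v for row in pooled for v in row)
    return softmax(dense(features, fc_weights, fc_biases))


random.seed(0)  # fixed seed so the untrained toy weights are reproducible
eye_patch = [[random.random() for _ in range(6)] for _ in range(6)]
kernels = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(2)]
n_features = 2 * 2 * 2  # 2 kernels, each 4x4 feature map pooled to 2x2
fc_weights = [[random.uniform(-1, 1) for _ in range(n_features)] for _ in range(2)]
fc_biases = [0.0, 0.0]
probs = forward(eye_patch, kernels, fc_weights, fc_biases)
print(dict(zip(["open", "closed"], probs)))
```

In a trained model the kernel and fully connected weights would come from the training process rather than from a random generator, and the larger of the two probabilities would give the open or closed decision.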


[Figure 1 flowchart: Drowsiness data set collection -> Detect the human eye and face -> Extract features using
overlapping HAAR -> Input image -> Classification using stacked CNN -> Eye image is closed? Yes: Person in drowsy
condition; No: Person in good condition]

Figure 1. Flowchart of method system design 

3. RESULTS AND DISCUSSION 

The warning generation process can be individualized by determining the driver's immediate risk tolerance
and comparing it with the driver's standard tolerance for risk. Algorithm 1 represents this reasonable level,
which may be the primary parameter of the recommended visual adaptive FCW algorithm. It explains the importance of
alert generation based on the likely intensity of the hazard, referred to as the warning triggering threshold (Thwt).
Consequently, a warning is generated whenever the instantaneous eye image (EImage) falls below Thwt, signifying that
the impact of risk is greater than the driver's current standard tolerance for risk. 


Algorithm 1: Visual distraction adaptive-FCW framework (any time instant t) 
Subsystem for Threshold Improvements: 
if eye_image_flag(t) = 1 then 
    if warning_flag(t - td) = 1 then 
        distracted_condition - warning triggered 
    else 
        if EImage(t) < Thcd then 
            cautious_condition - warning triggered 
        end 
    end 
else 
    if distraction_flag(t) = 1 then 
        Do nothing - warning not_triggered and no_input_image 
    else 
        if warning_flag(t - td) = 1 then 
            Warning triggered but no_input_image 
        end 
    end 
end 


The cautious deceleration threshold (Thcd) is the second threshold. In the simulation, the adaptive
FCW approach attempts to tune Thwt appropriately based on the most recent eye image profile while
simultaneously considering Thcd. The need for warning generation is confirmed only when the driver's eye
threshold is less than Thcd. 
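
The branching in Algorithm 1 can be written as a small decision function. This is only one possible reading of the listing, offered as a sketch: it assumes that an available eye image together with a warning at time t - td maps directly to the distracted condition, and that the cautious condition compares EImage(t) with Thcd. The function name, the returned labels, and the numeric threshold are hypothetical.

```python
def fcw_decision(eye_image_flag, prev_warning_flag, distraction_flag, e_image, th_cd):
    """One time step of the warning logic, under one reading of Algorithm 1.

    eye_image_flag:    True when an eye image is available at time t
    prev_warning_flag: True when a warning was active at time t - td
    distraction_flag:  True when the driver is flagged as distracted at time t
    e_image:           instantaneous eye-image value EImage(t), or None if unavailable
    th_cd:             cautious deceleration threshold Thcd
    """
    if eye_image_flag:
        if prev_warning_flag:
            return "distracted_condition: warning triggered"
        if e_image < th_cd:
            return "cautious_condition: warning triggered"
        return "no warning"
    # No eye image is available for this time instant.
    if distraction_flag:
        return "warning not triggered, no input image"
    if prev_warning_flag:
        return "warning triggered, no input image"
    return "no warning"


TH_CD = 0.25  # hypothetical value chosen for the example, not taken from the paper
cases = [
    (True, True, False, 0.40),    # image available, earlier warning still active
    (True, False, False, 0.10),   # image available, eye value below Thcd
    (True, False, False, 0.90),   # image available, eye value above Thcd
    (False, False, True, None),   # no image, driver flagged as distracted
    (False, True, False, None),   # no image, earlier warning still active
]
for eye_flag, prev_warn, distracted, value in cases:
    print(fcw_decision(eye_flag, prev_warn, distracted, value, TH_CD))
```

Keeping the decision in a pure function of the flags makes each branch easy to check against recorded frames before the thresholds are tuned adaptively.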

The classifier's performance was evaluated according to three dimensions: sensitivity, specificity,
and accuracy. Sensitivity indicates the proportion of actual positives that are correctly recognized as true
positives. It contrasts with specificity, which quantifies the
proportion of actual negatives that are correctly predicted as true negatives. The relationship between the
predicted and actual values is referred to as accuracy. It indicates the degree to which the predicted value is
close to the actual value. The following equations were used to quantify the three factors: 


$$\text{Sensitivity (\%)} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}} \times 100$$

$$\text{Specificity (\%)} = \frac{\text{TrueNegatives}}{\text{TrueNegatives} + \text{FalsePositives}} \times 100$$

$$\text{Accuracy (\%)} = \frac{\text{TruePositives} + \text{TrueNegatives}}{\text{TotalNumberOfSamples}} \times 100$$


Figure 2 shows the validation accuracy and loss of closed-eye detection in real scenarios; the accuracy
for all the training datasets is 84.74%. All evaluation training data are presented in the figures below.
Figure 3 presents the intercorrelations among the eight dataset measures. It shows that the system sets a
threshold of 0.32, at which it issues the closed-eye warning. Figure 4 illustrates the receiver operating
characteristic (ROC) curve, which plots the true positive rate against the false positive rate, in the
0.638 area under the curve. The validation loss is near zero at the maximum of 456 iterations.
The area under the curve is 0.684211, which is true-positive. True-negative is highly classified for the dataset.
Accuracy, in turn, is obtained from the eight dataset groups with 1,000 frames. The data portions allocated to
the training and testing phases of CNN model construction are 80% and 20%, respectively. 
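
For readers unfamiliar with how an area under the ROC curve is obtained, the sketch below sweeps a decision threshold over classifier scores and integrates the resulting curve with the trapezoidal rule. The labels and scores are hypothetical toy values, not data from this study, and the function names are assumptions made for the example.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points for every threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal integration of the true positive rate over the false positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: 1 = closed eye, 0 = open eye.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print(area_under_curve(points))
```

An area of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.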

Bulletin of Electr Eng & Inf, Vol. 12, No. 1, February 2023: 365-372 
Bulletin of Electr Eng & Inf ISSN: 2302-9285 

Figure 5 shows the driver's images with open and closed eyes, with the corresponding histograms underneath.
The images were converted from the original to grayscale to obtain low intensity: (A) and (C) are original images
in the range of 1.5 x 10* to 1.4 x 10* at 178 pixels; (B) and (D) are images after conversion to grayscale at a
lower number of pixels with intensity 6664 and 5892, respectively. The results reveal that converting images to
grayscale decreases the intensity; Zeger et al. [28] support this finding. Utilizing modern processors and parallel
programming, it is possible to perform simple pixel-by-pixel processing on a megapixel image in milliseconds.
Other operations require excessive time, such as facial recognition, image resizing, and mean-shift
segmentation. When processing time is critical for processing an image or extracting valuable data, most systems
should operate faster. 
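
The grayscale conversion and intensity histogram discussed above are examples of simple pixel-by-pixel operations. The sketch below shows one common way to compute them; the ITU-R BT.601 luminance weights are an assumption, since the conversion used in this study is not specified, and the 2x2 image is a hypothetical toy input.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to 8-bit luminance (BT.601 weights assumed)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]


def histogram(gray_image, bins=256):
    """Count how many pixels take each intensity value from 0 to bins - 1."""
    counts = [0] * bins
    for row in gray_image:
        for value in row:
            counts[value] += 1
    return counts


# Hypothetical toy image: white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = to_grayscale(image)
hist = histogram(gray)
print(gray)
print([(value, count) for value, count in enumerate(hist) if count])
```

Each output pixel depends on a single input pixel, which is why this kind of processing parallelizes well and stays fast even on large images.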

The performance of all classes in the dataset is depicted in Table 1, which shows the confusion matrix for
all classes. The classification reflects the effectiveness of the models that have been used. Accuracy, precision,
and F-score values are calculated using the true negative (TN), true positive (TP), false positive (FP), and false
negative (FN) values shown in Table 2. The accuracy value indicates how well the system can correctly categorize the
data. In other words, the accuracy value is the ratio of the correctly classified data to the entire data set. The
precision value is the proportion of correctly classified positive data to all data classified as positive. As
indicated in Table 3, the kappa coefficient is an additional measure of accuracy. A classification's kappa score
indicates how well the classification agrees with the true values beyond the agreement expected by chance. It can
take a value from 0 to 1. There is no similarity between the classified image and the reference image if the kappa
coefficient equals 0. If the kappa coefficient equals 1, the classified image is identical to the ground truth image.
Consequently, the higher the kappa coefficient, the more accurate the classification. 
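
As an illustration of the kappa coefficient, the sketch below computes a single overall Cohen's kappa from the confusion matrix in Table 1. This convention is an assumption made for the example; it is not the per-class kappa column reported in Table 3, and the function name is hypothetical.

```python
# Rows are actual classes 1 to 8, columns are predicted classes 1 to 8 (Table 1).
CONFUSION = [
    [76, 0, 0, 0, 0, 0, 0, 0],
    [0, 86, 0, 0, 0, 0, 0, 0],
    [0, 0, 72, 0, 0, 0, 0, 0],
    [0, 0, 0, 32, 0, 0, 0, 32],
    [0, 0, 0, 0, 181, 0, 0, 0],
    [0, 0, 0, 0, 0, 108, 0, 0],
    [0, 0, 0, 0, 0, 0, 142, 0],
    [0, 0, 0, 113, 0, 0, 0, 113],
]


def cohen_kappa(matrix):
    """Agreement between actual and predicted labels, corrected for chance."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1.0 - expected)


print(round(cohen_kappa(CONFUSION), 4))
```

The observed agreement is the fraction of samples on the diagonal, and the expected agreement is what two independent labelings with the same row and column totals would reach by chance.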
Figure 2. Result of the training progress 
Figure 3. The threshold for triggering a warning in the system 
Figure 4. ROC curve for output 
Figure 5. The driver's image in open and closed eyes histogram underneath 


Table 1. Confusion matrix for all classes 
               Predict  Predict  Predict  Predict  Predict  Predict  Predict  Predict
               class1   class2   class3   class4   class5   class6   class7   class8
Actual_class1  76       0        0        0        0        0        0        0
Actual_class2  0        86       0        0        0        0        0        0
Actual_class3  0        0        72       0        0        0        0        0
Actual_class4  0        0        0        32       0        0        0        32
Actual_class5  0        0        0        0        181      0        0        0
Actual_class6  0        0        0        0        0        108      0        0
Actual_class7  0        0        0        0        0        0        142      0
Actual_class8  0        0        0        113      0        0        0        113


Table 2. Multi-class confusion matrix output for TP, FP, FN, and TN 
               True positive   False positive  False negative  True negative
Actual_class1  76              0               0               879
Actual_class2  86              0               0               869
Actual_class3  72              0               0               883
Actual_class4  32              113             32              778
Actual_class5  181             0               0               774
Actual_class6  108             0               0               847
Actual_class7  142             0               0               813
Actual_class8  113             32              113             697
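
The per-class counts can be derived from the confusion matrix in Table 1 by treating each class in turn as the positive class and all others as negative. The sketch below shows that derivation; the function name is hypothetical. With 955 samples in total, class 4 has TN = 955 - 32 - 113 - 32 = 778 under this derivation.

```python
# Rows are actual classes 1 to 8, columns are predicted classes 1 to 8 (Table 1).
CONFUSION = [
    [76, 0, 0, 0, 0, 0, 0, 0],
    [0, 86, 0, 0, 0, 0, 0, 0],
    [0, 0, 72, 0, 0, 0, 0, 0],
    [0, 0, 0, 32, 0, 0, 0, 32],
    [0, 0, 0, 0, 181, 0, 0, 0],
    [0, 0, 0, 0, 0, 108, 0, 0],
    [0, 0, 0, 0, 0, 0, 142, 0],
    [0, 0, 0, 113, 0, 0, 0, 113],
]


def one_vs_rest_counts(matrix):
    """Return a (TP, FP, FN, TN) tuple for each class of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    col_totals = [sum(col) for col in zip(*matrix)]
    counts = []
    for k, row in enumerate(matrix):
        tp = row[k]                # actual k, predicted k
        fn = sum(row) - tp         # actual k, predicted as another class
        fp = col_totals[k] - tp    # another class, predicted as k
        tn = total - tp - fn - fp  # everything else
        counts.append((tp, fp, fn, tn))
    return counts


counts = one_vs_rest_counts(CONFUSION)
for k, (tp, fp, fn, tn) in enumerate(counts, start=1):
    print(f"class{k}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Only classes 4 and 8 have non-zero false positives and false negatives, because they are the only pair confused with each other in Table 1.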


Table 3. Multi-class confusion matrix output 
Class    Accuracy   Error      Accuracy  Error     Sensitivity  Specificity  Precision  False     F1     Matthews     Kappa
         of single  of single  in total  in total                                       positive  score  correlation
                                                                                        rate             coefficient
Class 1  1          0          0.07958   0         1            1            1          0         1      1            0.84084
Class 2  1          0          0.09005   0         1            1            1          0         1      1            0.8199
Class 3  1          0          0.07539   0         1            1            1          0         1      1            0.84921
Class 4  0.5        0.5        0.03350   0.11832   0.5          0.87318      0.22069    0.1268    0.306  0.26003      0.79462
Class 5  1          0          0.18953   0         1            1            1          0         1      1            0.62094
Class 6  1          0          0.11309   0         1            1            1          0         1      1            0.77382
Class 7  1          0          0.14869   0         1            1            1          0         1      1            0.70262
Class 8  0.5        0.5        0.11832   0.03350   0.5          0.9561       0.77931    0.0438    0.609  0.5402       0.64089
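
Several columns of Table 3 follow directly from the TP, FP, FN, and TN counts. The sketch below recomputes sensitivity, specificity, precision, false positive rate, F1 score, and the Matthews correlation coefficient for the two confused classes, using the standard definitions; the function and variable names are hypothetical. The class 4 counts use TN = 778, the value obtained from 955 - 32 - 113 - 32.

```python
import math


def class_metrics(tp, fp, fn, tn):
    """Standard one-vs-rest metrics from the four confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    false_positive_rate = fp / (fp + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "false_positive_rate": false_positive_rate,
        "f1": f1,
        "mcc": mcc,
    }


# (TP, FP, FN, TN) for the two classes that are confused with each other.
class4 = class_metrics(32, 113, 32, 778)
class8 = class_metrics(113, 32, 113, 697)
for name, metrics in (("Class 4", class4), ("Class 8", class8)):
    print(name, {key: round(value, 5) for key, value in metrics.items()})
```

For the six classes without any misclassification, every ratio is either 1 or 0, which matches the corresponding rows of the table.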


4. CONCLUSION AND RECOMMENDATION 

The paper's primary objective is to identify distraction aspects and technologies that address drivers'
distractions based on their characteristics and faces. As previously explained, numerous systems exist for image
processing. Notably, the stacked CNN for classification, combined with overlapping HAAR features for
identification, yielded the highest value for supervised learning. Another significant finding is that the method
used to detect distractions can be further improved regarding classification, identification, and verification to
obtain authentic and more prominent images as input datasets. This work proposes an adaptive framework for
FCW that is driver distraction-aware. This method is intended to decrease the number of false warnings, which can
distract from important notifications. The CNN classification technique relies on a realistic dataset to reveal
the driver's distraction status, focusing on the driver's face, eyes, and mouth. 
The visual of closed eyes has been used as a proxy for the driver's tolerance for risk, with a threshold
set to define when warnings should be generated. This threshold is adaptively and continuously revised in
response to driver distraction. Thus, a driver's perception of dangerous scenarios is captured based on the image
profile of the eyes. This article also contributes to the literature by examining the driver's face, eyes, and mouth,
because false warnings arise where the system cannot distinguish between different drivers. Possible FCW innovations
should be reviewed to incorporate diverse data sources and improve driver distraction management activities. Due to
the significant quality gain achieved at a modest advancement in network specifications, combining methods has
demonstrated considerable success in statistical signal training, extraction, and classification problems. It
should now be possible to reduce the number of false alerts encountered in nearly all FCW systems. 

Adding these parameters to the system increases the reduction in false positives and false
negatives. Previous research had overlooked both parameters: driver characteristics (such as braking, steering,
speedy turns, and signaling) and images of the face while the driver is tired. This combination of inputs will
expand the existing system to be more innovative, in line with evolving automation in the industry. It is
anticipated that additional research into different methods and FCW will offer further data to support the
proposed approach. 